WHAT IS CLAIMED IS: 
1. A method for performing an inverse discrete cosine transform (IDCT) on a
plurality of input coefficients, the method for performing the IDCT comprising:

performing a first one directional (1D) IDCT resulting in a plurality of first 1D
IDCT coefficients;

performing a second 1D IDCT resulting in a plurality of second 1D IDCT
coefficients;

performing the first 1D IDCT and the second 1D IDCT including performing a
first plurality of intermediate butterfly computations; and

rounding and shifting the plurality of second 1D IDCT coefficients resulting in
a plurality of output coefficients.

2. The method for performing the IDCT on the plurality of input coefficients as 
claimed in claim 1, wherein: 

the step of performing the first plurality of intermediate butterfly computations
including: 

performing a plurality of intermediate multiplications resulting in a 
plurality of initial products; and 

performing a plurality of intermediate additions. 
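As an illustrative aside, the intermediate multiplications and additions recited in claim 2 can be sketched as a single butterfly. The function names, the fixed-point scale `FRAC_BITS`, and the sample constant `C4` are assumptions chosen for illustration; the claims do not specify them.

```python
import math

FRAC_BITS = 15  # assumed fixed-point scale; not stated in the claims


def to_fixed(x):
    """Convert a real constant to a fixed-point integer (assumed Q15 format)."""
    return round(x * (1 << FRAC_BITS))


def butterfly(a, b, c_fixed):
    """One butterfly: scale b by a constant, then form a sum and a difference."""
    product = (b * c_fixed) >> FRAC_BITS  # intermediate multiplication
    return a + product, a - product       # intermediate additions


# Sample trigonometric constant cos(pi/4); the choice is an assumption.
C4 = to_fixed(math.cos(math.pi / 4))
```

A call such as `butterfly(0, 1000, C4)` scales the second input by roughly 0.707 and returns the sum and difference of the two terms.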

3. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 2, wherein:

the step of performing a plurality of intermediate multiplications including:

multiplying input coefficients by a trigonometric constant producing an
initial product; and

maintaining the initial product at no more than 16-bits.

4. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 3, wherein: 

the step of maintaining the initial product at no more than 16-bits including 
shifting the initial product right a plurality of bits resulting in a shifted initial product; 
and 

rounding the shifted initial product utilizing a round near positive (RNP)
rounding scheme. 
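The multiply, shift, and round steps of claims 3 and 4 can be sketched as follows. This sketch assumes that round near positive (RNP) means rounding to the nearest integer with ties going toward positive infinity, implemented by adding half of the divisor before an arithmetic right shift. The function names, the 15-bit shift, and the sample constant are hypothetical and not taken from the claims.

```python
def shift_right_rnp(value, shift):
    """Shift right by `shift` bits, rounding to nearest with ties toward +infinity.

    Python's >> on negative integers is an arithmetic (floor) shift.
    """
    return (value + (1 << (shift - 1))) >> shift


def mul_const_16bit(coeff, c_fixed, shift=15):
    """Multiply by a fixed-point constant and bring the result back to 16 bits."""
    product = coeff * c_fixed                 # full-width initial product
    result = shift_right_rnp(product, shift)  # shifted and RNP-rounded product
    if not -32768 <= result <= 32767:
        raise OverflowError("result does not fit in a signed 16-bit word")
    return result
```

Under this assumption, a tie such as 1.5 rounds to 2 and a tie such as -1.5 rounds to -1, which is the asymmetry that distinguishes RNP from a round away from zero scheme.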

5. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 4, wherein:

the step of performing the first plurality of intermediate butterfly computations
of the first 1D IDCT and the second 1D IDCT including performing a second plurality
of intermediate butterfly computations simultaneously in parallel.

6. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim [number illegible], wherein:

the step of performing the intermediate butterfly computation of the first 1D
IDCT and the second 1D IDCT including performing each intermediate butterfly
computation in a single instruction.

7. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 3, wherein: 

the step of maintaining the initial product at no more than 16-bits including
rounding the initial product utilizing a round near positive (RNP) rounding scheme.

8. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 1, wherein:

performing the first and second 1D IDCT including rounding utilizing a RNP
rounding scheme and not utilizing a round away from zero (RAZ) rounding
scheme.

9. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 8, wherein:

the step of rounding and shifting including rounding utilizing a RAZ rounding
scheme.

10. The method for performing the IDCT on the plurality of input coefficients as 
claimed in claim 1, wherein: 

the step of performing the intermediate butterfly computation of the first 1D
IDCT and the second 1D IDCT including performing each intermediate butterfly
computation in a single instruction. 

11. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 10, wherein: 

the step of performing the first plurality of intermediate butterfly computations
of the first 1D IDCT and the second 1D IDCT including performing a second plurality
of intermediate butterfly computations simultaneously in parallel.

12. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 1, wherein:

the step of performing the first plurality of intermediate butterfly computations
including performing each intermediate butterfly computation in a single instruction.

13. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 12, wherein:

the step of performing the first plurality of intermediate butterfly computations
including performing a second plurality of intermediate butterfly computations
simultaneously in parallel.

14. The method for performing the IDCT on the plurality of input coefficients as 
claimed in claim 13, wherein:

the step of performing a second plurality of intermediate butterfly
computations simultaneously in parallel including performing at least four
intermediate butterfly computations simultaneously in parallel.
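To illustrate performing four butterflies simultaneously, the sketch below models a packed register as a Python list of four signed 16-bit lanes. The list-based model, the helper names, and the wraparound behaviour on overflow are assumptions for illustration; the claims do not name specific packed instructions.

```python
LANES = 4  # claim 14 recites at least four butterflies in parallel


def wrap16(x):
    """Wrap an integer to signed 16-bit, as a packed 16-bit lane would."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def packed_add(a, b):
    """Lane-wise addition of two four-lane vectors."""
    return [wrap16(x + y) for x, y in zip(a, b)]


def packed_sub(a, b):
    """Lane-wise subtraction of two four-lane vectors."""
    return [wrap16(x - y) for x, y in zip(a, b)]


def packed_butterfly(a, b):
    """Four sum/difference butterflies, one per lane."""
    assert len(a) == LANES and len(b) == LANES
    return packed_add(a, b), packed_sub(a, b)
```

On hardware with packed arithmetic, each of `packed_add` and `packed_sub` would correspond to one instruction operating on all four lanes at once.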

15. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 1, wherein:

the step of shifting the input coefficients left a plurality of bits including
shifting the input coefficients left at least 4-bits.

16. The method for performing the IDCT on the plurality of input coefficients as 
claimed in claim 1, further comprising:

loading the input coefficients into at least one register including loading a
plurality of the input coefficients simultaneously in parallel and shifting the input
coefficients left a plurality of bits prior to the step of performing the first 1D IDCT.

17. The method for performing the IDCT on the plurality of input coefficients as 
claimed in claim 16, wherein:

the step of loading a plurality of coefficients simultaneously in parallel 
including loading at least four coefficients simultaneously in parallel. 

18. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 1, wherein:

the step of shifting the input coefficients left including shifting a plurality of
the input coefficients left simultaneously in parallel.

19. The method for performing the IDCT on the plurality of input coefficients as
claimed in claim 18, wherein:

the step of shifting a plurality of the coefficients left simultaneously including 
shifting at least four coefficients simultaneously in parallel.
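The loading and left-shifting steps of claims 15 to 19 can be sketched as below, again modelling a register as a list of four lanes. The function name, its parameters, and the default shift of 4 bits (the minimum recited in claim 15) are illustrative assumptions. The left shift presumably adds fractional bits so that later right shifts lose less precision, although the claims do not state the purpose.

```python
def load_and_shift(coeffs, start, shift=4, lanes=4):
    """Take `lanes` coefficients beginning at `start` and shift each left by `shift` bits."""
    group = coeffs[start:start + lanes]  # models one parallel load
    return [c << shift for c in group]   # models one packed left shift
```

For example, `load_and_shift([1, -2, 3, 4, 5], 0)` loads the first four coefficients and multiplies each by 16.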

20. A method for performing an inverse discrete cosine transform (IDCT) on a 
plurality of input coefficients, the method for performing the IDCT comprising:

performing a first one directional (1D) IDCT resulting in a plurality of first 1D
IDCT coefficients including utilizing a round-near-positive (RNP) rounding scheme;

performing a second 1D IDCT resulting in a plurality of second 1D IDCT
coefficients including utilizing a round-near-positive (RNP) rounding scheme; and

rounding and shifting the plurality of second 1D IDCT coefficients resulting in
a plurality of output coefficients including rounding utilizing a round away from zero
(RAZ) rounding scheme.

21. The method for performing the IDCT as claimed in claim 20, wherein: 
the step of rounding and shifting including rounding utilizing the RAZ
rounding scheme including:

shifting the second 1D IDCT final coefficient right a plurality of bits
resulting in a shifted final coefficient;

adding a conditional constant with the shifted final coefficient resulting
in a conditional product;

adding the second 1D IDCT final coefficient with the conditional
product resulting in a compensated final product; and

shifting the compensated final product right a plurality of bits.

22. The method for performing the IDCT as claimed in claim 21, wherein:

the step of shifting the second 1D IDCT final coefficient including shifting the
second 1D IDCT final coefficient right at least 15-bits.

23. The method for performing the IDCT as claimed in claim 21, wherein:
the step of adding the conditional constant including:

adding 32 if the second 1D IDCT final coefficient is positive; and
adding 31 if the second 1D IDCT final coefficient is negative.

24. The method for performing the IDCT as claimed in claim 21, wherein:

the step of shifting the compensated final product right including shifting the
compensated final product right at least 6-bits.

25. The method for performing the IDCT as claimed in claim 21, wherein:

the step of rounding and shifting including performing the step of rounding and
shifting in four instructions.

26. The method for performing the IDCT as claimed in claim 25, wherein:

performing a plurality of the steps of rounding and shifting simultaneously in
parallel. 
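The four-step rounding of claims 21 to 25 can be sketched as follows, using the 15-bit and 6-bit shifts and the constants 32 and 31 recited in claims 22 to 24. The sketch assumes each coefficient fits in a signed 16-bit word, so that an arithmetic right shift by 15 yields 0 for a non-negative value and -1 for a negative value. The function names are hypothetical.

```python
def raz_round_shift(x):
    """Divide x by 64, rounding to nearest with ties away from zero.

    Assumes -32768 <= x <= 32767 (signed 16-bit range).
    """
    sign = x >> 15           # step 1: 0 if x >= 0, -1 if x < 0
    cond = sign + 32         # step 2: conditional constant, 32 or 31
    compensated = x + cond   # step 3: add the coefficient itself
    return compensated >> 6  # step 4: final arithmetic right shift


def raz_round_shift_packed(lanes):
    """Apply the same four steps to every lane, modelling parallel execution."""
    return [raz_round_shift(x) for x in lanes]
```

With these constants, 32 (exactly 0.5 after scaling) rounds to 1 and -32 rounds to -1, while 31 and -31 both round to 0. Each of the four steps maps naturally onto one packed instruction, which is consistent with the four-instruction count in claim 25.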

27. The method for performing the IDCT as claimed in claim 20, wherein:
the step of rounding and shifting including performing at least four of the steps
of rounding and shifting simultaneously in parallel.

28. The method for performing the IDCT as claimed in claim 20, further 
comprising: 

transposing the first 1D IDCT coefficients prior to performing the second 1D
IDCT; and

transposing the IDCT output coefficients resulting in final IDCT outputs
coefficients.

29. The method for performing the IDCT as claimed in claim 28, further
comprising:

the step of transposing the first 1D IDCT coefficients and the IDCT output
coefficients including implementing a shuffle instruction.

30. The method for performing the IDCT as claimed in claim 28, further 
comprising: 

clipping the final IDCT outputs coefficients. 
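The ordering of the transposing and clipping steps in claims 28 to 30 can be sketched as a row-column pipeline. The 1D transform and the rounding step are passed in as parameters, since this sketch does not attempt to reproduce them. The function names and the clipping range of -256 to 255 are assumptions for illustration; the claims do not state a range.

```python
def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


def clip(value, lo=-256, hi=255):
    """Clamp a value to [lo, hi]; the default range is an assumed example."""
    return max(lo, min(hi, value))


def idct_2d(block, idct_1d, round_shift):
    """Row-column 2D transform: 1D pass, transpose, 1D pass, round, transpose, clip."""
    first = [idct_1d(row) for row in block]              # first 1D IDCT
    second = [idct_1d(row) for row in transpose(first)]  # transpose, second 1D IDCT
    rounded = [[round_shift(v) for v in row] for row in second]
    final = transpose(rounded)                           # restore orientation
    return [[clip(v) for v in row] for row in final]
```

Because every 1D pass runs along rows, the two transposes let the same row routine serve both dimensions and return the result in the original orientation.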

31. A method for decompressing compressed data having a plurality of input
coefficients, comprising:

performing a first one directional (1D) IDCT and a second 1D IDCT on the
plurality of input coefficients resulting in output coefficients including:
utilizing a round near positive (RNP) rounding scheme and
not utilizing a round away from zero (RAZ) rounding scheme; and

rounding and shifting the output coefficients including utilizing the RAZ
rounding scheme.

32. The method for decompressing compressed data as claimed in claim 31,
wherein: 

the IDCT is performed in less than 397 cycles. 

33. The method for decompressing compressed data as claimed in claim 32, 
wherein: 

complying with an Institute of Electrical and Electronics Engineers (IEEE) 
1180 accuracy standard. 

34. The method for decompressing compressed data as claimed in claim 33, 
wherein:

implementing the IDCT utilizing single instruction multiple data instructions
(SIMD).

35. The method for decompressing compressed data as claimed in claim 34, 
wherein: 

performing at least four SIMD instructions simultaneously in parallel.

36. The method for decompressing compressed data as claimed in claim 31,
wherein: 

performing the first 1D IDCT and the second 1D IDCT such that four
coefficients are operated on simultaneously in parallel. 



37. An apparatus for decompressing a compressed data signal, comprising: 

a means for loading a plurality of input coefficients into at least one register;

a means for shifting the input coefficients a plurality of bits coupled with the
register configured to receive the input coefficients and produce shifted input
coefficients;

a means for performing a first one directional (1D) Inverse Discrete Cosine
Transform (IDCT) coupled with the means for shifting the input coefficients
configured to receive the shifted coefficients and produce a first 1D IDCT output
matrix;

a means for transposing the first 1D IDCT output matrix coupled with the
means for performing the first IDCT configured to transpose the first 1D IDCT output
matrix and to produce a first transposed IDCT output matrix;

a means for performing a second 1D IDCT on the transposed IDCT output
matrix coupled with the means for transposing the first IDCT output matrix configured
to receive the transposed first IDCT output matrix and to produce a second IDCT
output matrix;

a means for rounding away from zero (RAZ) and shifting coupled with the
means for performing the second 1D IDCT configured to round and shift coefficients
of the second 1D IDCT output matrix to produce a rounded second 1D IDCT output
matrix; and

a means for transposing the rounded second 1D IDCT output matrix coupled
with the means for RAZ and shifting configured to transpose the rounded second 1D
IDCT output matrix to produce a decompressed output.

38. The apparatus for decompressing a compressed data signal as claimed in claim
37, further comprising:

a microprocessor including parallel processing, multimedia applications, at
least one register, the means for loading a plurality of input coefficients, the means for
shifting the input coefficients, the means for performing a first 1D IDCT, the means
for transposing the first 1D IDCT, the means for performing the second 1D IDCT, the
means for RAZ and shifting, the means for transposing the rounded second 1D IDCT
output matrix; and

the microprocessor configured to perform at least one single instruction 
multiple data (SIMD) instruction on a plurality of coefficients simultaneously in 
parallel. 



39. A computer program product for providing the decompression of a compressed
signal, the computer program product including a computer readable storage medium
and a computer program mechanism embedded therein, the computer program
mechanism comprising:

a method of performing an Inverse Discrete Cosine Transform (IDCT)
comprising: 

loading a plurality of input coefficients into at least one register;

shifting the input coefficients left a plurality of bits;

performing a first one directional (1D) Inverse Discrete Cosine
Transform (IDCT) including utilizing a round near positive (RNP) rounding scheme
producing a first IDCT output matrix;

transposing the first IDCT output matrix producing a transposed IDCT
output matrix;

performing a second 1D IDCT on the transposed IDCT output matrix
including utilizing a RNP rounding scheme producing a second IDCT output matrix
including a plurality of components; 

rounding away from zero and shifting each of the components of the 
second IDCT output matrix producing a rounded IDCT output matrix; and

transposing the rounded IDCT output matrix producing a decompressed
output.